{ "cells": [ { "cell_type": "markdown", "id": "8lwheJQ8RxAw", "metadata": { "id": "8lwheJQ8RxAw" }, "source": [ "### **3. X-learner**\n", "Next, let's introduce the X-learner. As a combination of S-learner and T-learner, the X-learner can use information from the control(treatment) group to derive better estimators for the treatment(control) group, which is provably more efficient than the above two.\n", "\n", "The algorithm of X learner can be summarized as the following steps:\n", "\n", "\n", "**Step 1:** Estimate $\\mu_0(s)$ and $\\mu_1(s)$ separately with any regression algorithms or supervised machine learning methods (same as T-learner);\n", "\n", "\n", "**Step 2:** Obtain the imputed treatment effects for individuals\n", "\\begin{equation*}\n", "\\tilde{\\Delta}_i^1:=R_i^1-\\hat\\mu_0(S_i^1), \\quad \\tilde{\\Delta}_i^0:=\\hat\\mu_1(S_i^0)-R_i^0.\n", "\\end{equation*}\n", "\n", "**Step 3:** Fit the imputed treatment effects to obtain $\\hat\\tau_1(s):=\\mathbb{E}[\\tilde{\\Delta}_i^1|S=s]$ and $\\hat\\tau_0(s):=\\mathbb{E}[\\tilde{\\Delta}_i^0|S=s]$;\n", "\n", "**Step 4:** The final HTE estimator is given by\n", "\\begin{equation*}\n", "\\hat{\\tau}_{\\text{X-learner}}(s)=g(s)\\hat\\tau_0(s)+(1-g(s))\\hat\\tau_1(s),\n", "\\end{equation*}\n", "\n", "where $g(s)$ is a weight function between $[0,1]$. A possible way is to use the propensity score model as an estimate of $g(s)$." ] }, { "cell_type": "code", "execution_count": 2, "id": "eRpP5k9MBtzO", "metadata": { "id": "eRpP5k9MBtzO" }, "outputs": [], "source": [ "# import related packages\n", "import numpy as np\n", "import pandas as pd\n", "from matplotlib import pyplot as plt;\n", "from sklearn.ensemble import GradientBoostingRegressor\n", "from sklearn.linear_model import LinearRegression\n", "from causaldm.learners.CEL.Single_Stage import _env_getdata_CEL" ] }, { "cell_type": "markdown", "id": "XUu695Qrf61-", "metadata": { "id": "XUu695Qrf61-" }, "source": [ "### MovieLens Data" ] }, { "cell_type": "code", "execution_count": 3, "id": "JhfJntzcVVy2", "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 424 }, "executionInfo": { "elapsed": 288, "status": "ok", "timestamp": 1676750101543, "user": { "displayName": "Yang Xu", "userId": "12270366590264264299" }, "user_tz": 300 }, "id": "JhfJntzcVVy2", "outputId": "7fab8a7a-7cd9-445c-a005-9a6d1994a071" }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
user_idmovie_idratingageDramaSci-Figender_Moccupation_academic/educatoroccupation_college/grad studentoccupation_executive/managerialoccupation_otheroccupation_technician/engineer
048.01193.04.025.01.00.01.00.01.00.00.00.0
148.0919.04.025.01.00.01.00.01.00.00.00.0
248.0527.05.025.01.00.01.00.01.00.00.00.0
348.01721.04.025.01.00.01.00.01.00.00.00.0
448.0150.04.025.01.00.01.00.01.00.00.00.0
.......................................
656375878.03300.02.025.00.01.00.00.00.00.01.00.0
656385878.01391.01.025.00.01.00.00.00.00.01.00.0
656395878.0185.04.025.00.01.00.00.00.00.01.00.0
656405878.02232.01.025.00.01.00.00.00.00.01.00.0
656415878.0426.03.025.00.01.00.00.00.00.01.00.0
\n", "

65642 rows × 12 columns

\n", "
" ], "text/plain": [ " user_id movie_id rating age Drama Sci-Fi gender_M \\\n", "0 48.0 1193.0 4.0 25.0 1.0 0.0 1.0 \n", "1 48.0 919.0 4.0 25.0 1.0 0.0 1.0 \n", "2 48.0 527.0 5.0 25.0 1.0 0.0 1.0 \n", "3 48.0 1721.0 4.0 25.0 1.0 0.0 1.0 \n", "4 48.0 150.0 4.0 25.0 1.0 0.0 1.0 \n", "... ... ... ... ... ... ... ... \n", "65637 5878.0 3300.0 2.0 25.0 0.0 1.0 0.0 \n", "65638 5878.0 1391.0 1.0 25.0 0.0 1.0 0.0 \n", "65639 5878.0 185.0 4.0 25.0 0.0 1.0 0.0 \n", "65640 5878.0 2232.0 1.0 25.0 0.0 1.0 0.0 \n", "65641 5878.0 426.0 3.0 25.0 0.0 1.0 0.0 \n", "\n", " occupation_academic/educator occupation_college/grad student \\\n", "0 0.0 1.0 \n", "1 0.0 1.0 \n", "2 0.0 1.0 \n", "3 0.0 1.0 \n", "4 0.0 1.0 \n", "... ... ... \n", "65637 0.0 0.0 \n", "65638 0.0 0.0 \n", "65639 0.0 0.0 \n", "65640 0.0 0.0 \n", "65641 0.0 0.0 \n", "\n", " occupation_executive/managerial occupation_other \\\n", "0 0.0 0.0 \n", "1 0.0 0.0 \n", "2 0.0 0.0 \n", "3 0.0 0.0 \n", "4 0.0 0.0 \n", "... ... ... \n", "65637 0.0 1.0 \n", "65638 0.0 1.0 \n", "65639 0.0 1.0 \n", "65640 0.0 1.0 \n", "65641 0.0 1.0 \n", "\n", " occupation_technician/engineer \n", "0 0.0 \n", "1 0.0 \n", "2 0.0 \n", "3 0.0 \n", "4 0.0 \n", "... ... \n", "65637 0.0 \n", "65638 0.0 \n", "65639 0.0 \n", "65640 0.0 \n", "65641 0.0 \n", "\n", "[65642 rows x 12 columns]" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Get the MovieLens data\n", "MovieLens_CEL = _env_getdata_CEL.get_movielens_CEL()\n", "MovieLens_CEL.pop(MovieLens_CEL.columns[0])\n", "MovieLens_CEL = MovieLens_CEL[MovieLens_CEL.columns.drop(['Comedy','Action', 'Thriller'])]\n", "MovieLens_CEL" ] }, { "cell_type": "code", "execution_count": 4, "id": "J__3Ozs7Uxxs", "metadata": { "id": "J__3Ozs7Uxxs" }, "outputs": [], "source": [ "n = len(MovieLens_CEL)\n", "userinfo_index = np.array([3,6,7,8,9,10,11])\n", "SandA = MovieLens_CEL.iloc[:, np.array([3,4,6,7,8,9,10,11])]" ] }, { "cell_type": "code", "execution_count": 5, "id": "sfb-mplOP9HJ", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "executionInfo": { "elapsed": 268, "status": "ok", "timestamp": 1676750607995, "user": { "displayName": "Yang Xu", "userId": "12270366590264264299" }, "user_tz": 300 }, "id": "sfb-mplOP9HJ", "outputId": "c2e23b5e-3be2-4bcc-ac0e-c6622021f841" }, "outputs": [ { "data": { "text/plain": [ "GradientBoostingRegressor()" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Step 1: Fit two models under treatment and control separately, same as T-learner\n", "\n", "import numpy as np\n", "mu0 = GradientBoostingRegressor(max_depth=3)\n", "mu1 = GradientBoostingRegressor(max_depth=3)\n", "\n", "S_T0 = MovieLens_CEL.iloc[np.where(MovieLens_CEL['Drama']==0)[0],userinfo_index]\n", "S_T1 = MovieLens_CEL.iloc[np.where(MovieLens_CEL['Drama']==1)[0],userinfo_index]\n", "R_T0 = MovieLens_CEL.iloc[np.where(MovieLens_CEL['Drama']==0)[0],2] \n", "R_T1 = MovieLens_CEL.iloc[np.where(MovieLens_CEL['Drama']==1)[0],2] \n", "\n", "mu0.fit(S_T0, R_T0)\n", "mu1.fit(S_T1, R_T1)\n" ] }, { "cell_type": "code", "execution_count": 6, "id": "zb42ZMw3pkqm", "metadata": { "id": "zb42ZMw3pkqm" }, "outputs": [], "source": [ "# Step 2: impute the potential outcomes that are unobserved in original data\n", "\n", "n_T0 = len(R_T0)\n", "n_T1 = len(R_T1)\n", "\n", "Delta0 = mu1.predict(S_T0) - R_T0\n", "Delta1 = R_T1 - mu0.predict(S_T1) " ] }, { "cell_type": "code", "execution_count": 7, "id": "pxYLjE0Ar2_5", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "executionInfo": { "elapsed": 352, "status": "ok", "timestamp": 1676750611506, "user": { "displayName": "Yang Xu", "userId": "12270366590264264299" }, "user_tz": 300 }, "id": "pxYLjE0Ar2_5", "outputId": "bdb4d448-82ae-414c-9e8c-8bb374583691" }, "outputs": [ { "data": { "text/plain": [ "GradientBoostingRegressor(max_depth=2)" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Step 3: Fit tau_1(s) and tau_0(s)\n", "\n", "tau0 = GradientBoostingRegressor(max_depth=2)\n", "tau1 = GradientBoostingRegressor(max_depth=2)\n", "\n", "tau0.fit(S_T0, Delta0)\n", "tau1.fit(S_T1, Delta1)" ] }, { "cell_type": "code", "execution_count": 8, "id": "LRvEZ4uluT-U", "metadata": { "id": "LRvEZ4uluT-U" }, "outputs": [], "source": [ "# Step 4: fit the propensity score model $\\hat{g}(s)$ and obtain the final HTE estimator by taking weighted average of tau0 and tau1\n", "from sklearn.linear_model import LogisticRegression \n", "\n", "from sklearn.ensemble import GradientBoostingRegressor\n", "g = LogisticRegression()\n", "g.fit(MovieLens_CEL.iloc[:,userinfo_index],MovieLens_CEL['Drama'])\n", "\n", "HTE_X_learner = g.predict_proba(MovieLens_CEL.iloc[:,userinfo_index])[:,0]*tau0.predict(MovieLens_CEL.iloc[:,userinfo_index]) + g.predict_proba(MovieLens_CEL.iloc[:,userinfo_index])[:,1]*tau1.predict(MovieLens_CEL.iloc[:,userinfo_index])\n", "\n", "\n" ] }, { "cell_type": "markdown", "id": "FA-F8Jc_T5Lz", "metadata": { "id": "FA-F8Jc_T5Lz" }, "source": [ "Let's focus on the estimated HTEs for three randomly chosen users:" ] }, { "cell_type": "code", "execution_count": 9, "id": "GvHnTOxmT5Lz", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "executionInfo": { "elapsed": 318, "status": "ok", "timestamp": 1676750150517, "user": { "displayName": "Yang Xu", "userId": "12270366590264264299" }, "user_tz": 300 }, "id": "GvHnTOxmT5Lz", "outputId": "7b0b76fd-f5ac-4ab8-a3c0-188e15484fe7" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "X-learner: [0.33630057 0.31723622 0.37261498]\n" ] } ], "source": [ "print(\"X-learner: \",HTE_X_learner[np.array([0,1000,5000])])" ] }, { "cell_type": "code", "execution_count": 10, "id": "48136320", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Choosing Drama instead of Sci-Fi is expected to improve the rating of all users by 0.3566 out of 5 points.\n" ] } ], "source": [ "ATE_X_learner = np.sum(HTE_X_learner)/n\n", "print(\"Choosing Drama instead of Sci-Fi is expected to improve the rating of all users by\",round(ATE_X_learner,4), \"out of 5 points.\")" ] }, { "cell_type": "markdown", "id": "mVAZTZYTUKJ6", "metadata": { "id": "mVAZTZYTUKJ6" }, "source": [ "**Conclusion:** Same as the estimation result provided by S-learner and T-learner, people are more inclined to give higher ratings to drama than science fictions." ] }, { "cell_type": "markdown", "id": "zxpRscObJmbX", "metadata": { "id": "zxpRscObJmbX" }, "source": [ "**Note**: For more details about the meta learners, please refer to [1] as a detailed introduction of related methods." ] }, { "cell_type": "markdown", "id": "nyirbjS5JdGh", "metadata": { "id": "nyirbjS5JdGh" }, "source": [ "## References\n", "1. Kunzel, S. R., Sekhon, J. S., Bickel, P. J., and Yu, B. (2019). Metalearners for estimating heterogeneous treatment effects using machine learning. Proceedings of the national academy of sciences 116, 4156–4165.\n" ] } ], "metadata": { "colab": { "provenance": [] }, "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.13" } }, "nbformat": 4, "nbformat_minor": 5 }